Skip to content

sd: sync to master-593-3d6064b#2175

Merged
LostRuins merged 4 commits into
LostRuins:concedo_experimentalfrom
wbruna:kcpp_sd_update_202604_4
May 4, 2026
Merged

sd: sync to master-593-3d6064b#2175
LostRuins merged 4 commits into
LostRuins:concedo_experimentalfrom
wbruna:kcpp_sd_update_202604_4

Conversation

@wbruna
Copy link
Copy Markdown

@wbruna wbruna commented Apr 30, 2026

master-592-b8079e2 removes all build-time backend dependencies on sd.cpp sources.

The backend change is very incompatible with many ggml includes: I've had to remove util.h from sdtypes_adapter.cpp because it triggered weird errors on llama.h. A better solution could be splitting util.h into two headers, but I didn't want to change stuff outside otherarch/sdcpp in the middle of this PR. I've also had to adapt our main_gpu support, but didn't test it very well.

On the other hand, since sd.cpp became backend-agnostic, sdtypes_adapter.cpp now only needs to be built once for all backends. Both Vulkan and ROCm seem to be running fine with the change.

After #2155 . Edit: moving the discussion here to make it easier to follow.

@wbruna wbruna force-pushed the kcpp_sd_update_202604_4 branch from c1de647 to 9f2f7a4 Compare May 1, 2026 10:27
@wbruna wbruna marked this pull request as ready for review May 1, 2026 10:28
@wbruna
Copy link
Copy Markdown
Author

wbruna commented May 1, 2026

Copied from #2155:

oof, the backend change does not look very pleasant, even though I can see why you added it.

How exactly is backend determination supposed to be done now?

We first pick the device name with this:

__STATIC_INLINE__ std::string get_default_backend_name() {
    ggml_backend_load_all_once();
    // should pick the same backend as ggml_backend_init_best
    ggml_backend_dev_t dev = ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_GPU);
    dev                    = dev ? dev : ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_IGPU);
    dev                    = dev ? dev : ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_CPU);
    if (dev == nullptr) {
        return "";
    }
    return ggml_backend_dev_name(dev);
}

then initialize it by name.

because currently when selecting a desired backend (via compile flags for that target), the selected backend is expected to be used for everything (e.g. If i pick CPU, even though I might have cuda or vulkan, I expect to use CPU for TTS, LLM and image gen).

It still works like this: only two types of backend will be exposed by ggml at a time (Vulkan and CPU, ROCm and CPU, etc), so it'll prefer the first discrete GPU, then the first iGPU; and use it for everything (the flags --x-on-cpu still work the same way as before).

I do not want automatic backend determination from C++ side, kcpp backend selection should always be explicit (which means it must use pure CPU, or pure vulkan with a specified device if the user chooses it, even though I launched the cuda build with CUDA available and supported). I think since ggml is built with the expected flags this should be doable but I'm very unclear on how this auto works right now.

In the sense of, say, Vulkan0 versus Vulkan1, I think the selection has changed: I have a lot of scripts to convince everybody that my dGPU is at a specific slot (it often switches places with the iGPU at boot 😕), and sd.cpp now seems to correctly pick the dGPU regardless of them. But note: even if the previous algorithm was just "pick the first device", technically it was already the C++ layer choosing it (unless forced with main_gpu).

We will likely get more sophisticated selection soon (like "Vulkan1 for the diffusion, Vulkan0 for the VAE, CPU for the conditioner"), but we are not there yet: it's still the same backend for everything, except for --x-on-cpu.

I made a hackish adaptation for main_gpu to keep the interface working as-is, but I believe a more future-proof approach would be:

  • we add a function on sdtype_adapter that publishes to the Python layer the list of device names, together with useful metadata (e.g. if it is a dGPU, if it would be the C++ choice, etc);
  • we change the model loading interface to receive names for the diffusion model / clip / vae device.
  • sdtype_adapter passes the main device name directly to the default device selection at util.cpp (like the main_gpuadaptation I did).

This way, if the Python layer decides it wants to force a device, let C++ pick whatever, or even support fancy aliases like "iGPU", it can. In fact, we could even add support for placing models on different devices before upstream: the diff would be pretty simple at this point.

@wbruna
Copy link
Copy Markdown
Author

wbruna commented May 1, 2026

@LostRuins , I've just noticed this:

    bool load_model(const load_model_inputs inputs)
    {
        (...)
        std::string vulkan_info_raw = inputs.vulkan_info;
        std::string vulkan_info_str = "";
        for (size_t i = 0; i < vulkan_info_raw.length(); ++i) {
            vulkan_info_str += vulkan_info_raw[i];
            if (i < vulkan_info_raw.length() - 1) {
                vulkan_info_str += ",";
            }
        }
        const char* existingenv = getenv("GGML_VK_VISIBLE_DEVICES");
        if(!existingenv && vulkan_info_str!="")
        {
            vulkandeviceenv = "GGML_VK_VISIBLE_DEVICES="+vulkan_info_str;
            putenv((char*)vulkandeviceenv.c_str());
        }

GGML_VK_* env vars shouldn't be set here: ggml's behavior for image, TTS, etc. could change depending on the presence of a text model. Maybe it's better to move it to the Python layer, or to a general 'init' function?

@wbruna
Copy link
Copy Markdown
Author

wbruna commented May 1, 2026

And as far as I can tell, apart from main_gpu and global env vars, we currently don't have device selection for image gen: it's left to the upstream code to decide.

I've added device reporting to the sdtype_get_info function right before realizing I won't need it 😅 Selecting by number is actually easier. I'll keep the info support for now, just in case we want e.g. debug logging.

My plan is changing the "on_gpu" booleans at the C++ interface to receive the device index instead. So we'd have integers clip_gpu and vae_gpu, with:

  • -2: CPU (default for clip_gpu)
  • -1: for clip_gpu and vae_gpu, same as main_gpu (default for vae_gpu); for main_gpu, means "let sd.cpp choose";
  • >= 0: use that GPU number

I believe this will be enough for any behavior we need:

  • to avoid having the C++ layer choose the backend, the Python layer can simply set main_gpu;
  • things like "main SD device is CPU, with clip on GPU 1, and VAE on GPU 0" will already be supported at the C++ level (of course, we'd still need launcher and command-line support).

@LostRuins
Copy link
Copy Markdown
Owner

If you search GGML_VK_VISIBLE_DEVICES in koboldcpp you'll see it initialized in every single load function - because it's not known which modules (e.g. tts + video gen, or whatever) the user might wish to load, so I duplicated it. We could probably deduplicate it and have a unified pre-loader maybe...

@wbruna wbruna force-pushed the kcpp_sd_update_202604_4 branch 2 times, most recently from 1d5c5da to c002d36 Compare May 1, 2026 17:29
@wbruna
Copy link
Copy Markdown
Author

wbruna commented May 1, 2026

I've simplified the main_gpu handling. Should be good to go already; I can add the multi-device support afterwards.

About the GGML_VK_VISIBLE_DEVICES handling: any reason to keep doing it at the C++ level? I saw CUDA_VISIBLE_DEVICES and HIP_VISIBLE_DEVICES being set on on koboldcpp.py.

@LostRuins
Copy link
Copy Markdown
Owner

About the GGML_VK_VISIBLE_DEVICES handling: any reason to keep doing it at the C++ level? I saw CUDA_VISIBLE_DEVICES and HIP_VISIBLE_DEVICES being set on on koboldcpp.py.

CUDA_VISIBLE_DEVICES isn't actually read by koboldcpp code at all, rather it's handled by cuda itself during load (changing visibility before even reaching the ggml loader). Whereas for GGML_VK_VISIBLE_DEVICES it's read inside ggml_vk_instance_init(). I am not sure if setting it in python would work well, it probably could work too, but it's complicated by the fact that device overrides e.g. --device Vulkan2, Vulkan3 is also applied. Mostly I have not taken time to disentangle it all.

@wbruna
Copy link
Copy Markdown
Author

wbruna commented May 2, 2026

I am not sure if setting it in python would work well, it probably could work too, but it's complicated by the fact that device overrides e.g. --device Vulkan2, Vulkan3 is also applied. Mostly I have not taken time to disentangle it all.

Gave it a try: #2179 . I've notice the inputs.devices_override settings, but they seemed mostly independent.

@wbruna wbruna force-pushed the kcpp_sd_update_202604_4 branch from c002d36 to 3288741 Compare May 3, 2026 22:03
@LostRuins LostRuins merged commit 276c651 into LostRuins:concedo_experimental May 4, 2026
@LostRuins
Copy link
Copy Markdown
Owner

turns out its shadowing caused by __STATIC_INLINE__ ggml_backend_reg_t ggml_backend_cpu_reg() conflicting with an earlier definition of ggml_backend_cpu_reg declared as extern, really unfortunate that sd.cpp used the same function name there. That basically means that llama.h can never be included in anything that also uses ggml_extend.

I can sort of work around it by making a separate utils header safe for sdcpp and put the llm related functions elsewhere.

@LostRuins
Copy link
Copy Markdown
Owner

Not the best solution but i did this: 950676f

ideally sd.cpp should still use different function names but since its probably bad to change upstream code unless we have to, this is the best we can do probably...

@wbruna
Copy link
Copy Markdown
Author

wbruna commented May 4, 2026

turns out its shadowing caused by __STATIC_INLINE__ ggml_backend_reg_t ggml_backend_cpu_reg() conflicting with an earlier definition of ggml_backend_cpu_reg declared as extern, really unfortunate that sd.cpp used the same function name there. That basically means that llama.h can never be included in anything that also uses ggml_extend.

Yeah... As I understand it, it was intentional, to be able to load even the cpu backend as optional. But that should be changed in ggml itself.

Not the best solution but i did this: 950676f

Looks good (I wouldn't even create a separate llmutils.cpp, but maybe it also helps keeping it organized).

What's the reason we do it this way, including the source files in the sdtype_adapter itself instead of building them separately? A normal split build would mostly eliminate these kind of issues.

ideally sd.cpp should still use different function names but since its probably bad to change upstream code unless we have to, this is the best we can do probably...

I'll bring this issue upstream. Code-wise, I don't think that header needs to be globally exposed to work; and we could also mass-rename the symbols to get them out of the way. Unfortunately, this doesn't affect the public interface, so it may be tricky to argue about why is it needed.

@LostRuins
Copy link
Copy Markdown
Owner

What's the reason we do it this way, including the source files in the sdtype_adapter itself instead of building them separately? A normal split build would mostly eliminate these kind of issues.

mainly because its a huge pain to juggle the individual object files in the makefile, since we build for many different targets. Each compilation unit except the final one can only consist of a single .cpp file, so you end up having many object files catered to each target (which gets worse if there's build specific flags, then you have someclass_vulkan_noavx2.o + someclass_vulkan_noavx.o etc... which also prevents an easy glob of all the .cpp files (since each target requires different files and different flags)

CMake solves that problem (it's a lot easier with cmakelists) but adds a new problem where cmake is a dependency, and also does not support easily building for multiple targets at once.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants